Abstract

Spectral graph theory-based methods represent an important class of
tools for studying the structure of networks. Spectral methods are
based on a first-order Markov chain derived from a random walk on the
graph and thus they cannot take advantage of important higher-order
network substructures such as triangles, cycles, and feed-forward
loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm
that allows for modeling higher-order network structures in a graph
partitioning framework. Our TSC algorithm allows the user to specify
which higher-order network structures (cycles, feed-forward loops,
etc.) should be preserved by the network clustering. Higher-order
network structures of interest are represented using a tensor, which
we then partition by developing a multilinear spectral method. Our
framework can be applied to discovering layered flows in networks as
well as graph anomaly detection, which we illustrate on synthetic
networks. In directed networks, a higher-order structure of particular
interest is the directed 3-cycle, which captures feedback loops in
networks. We demonstrate that our TSC algorithm produces large
partitions that cut fewer directed 3-cycles than standard spectral
clustering algorithms.

1 Introduction 

Spectral graph methods investigate the structure of networks by
studying the eigenvalues and eigenvectors of matrices associated to
the graph, such as its adjacency matrix or Laplacian matrix. Arguably
the most important spectral graph algorithms are the spectral graph
partitioning methods that identify partitions of nodes into low
conductance communities in undirected networks [1]. While the simple
matrix computations and strong mathematical theory behind spectral
clustering methods make them appealing, the methods are inherently
limited to two-dimensional structures, for example, undirected edges
connecting pairs of nodes. Thus, it is a natural question whether
spectral methods can be generalized to higher-order network
structures. For example, traditional spectral clustering attempts to
minimize the (appropriately normalized) number of first-order
structures (i.e., edges) that need to be cut in order to split the
graph into two parts. In a similar spirit, a higher-order
generalization of spectral clustering would try to minimize cutting
higher-order structures that involve multiple nodes (e.g., triangles).

Incorporating higher-order graph information (that is, network
motifs/graphlets) into the partitioning process can significantly
improve our understanding of the underlying network. For example,
triangles (three-dimensional network structures involving three nodes)
have proven fundamental to understanding social networks [14, 21] and
their community structure [10, 26, 29]. Most importantly, higher-order
spectral clustering would allow for greater modeling flexibility as
the application would drive which higher-order network structures
should be preserved by the network clustering. For example, in
financial networks, directed cycles might indicate money laundering
and higher-order spectral clustering could be used to identify groups
of nodes that participate in such directed cycles. As directed cycles
involve multiple edges, current spectral clustering tools would not be
able to identify groups with such structural signatures.

Generalizing spectral clustering to higher-order structures involves
several challenges. The essential challenge is that higher-order
structures are often encoded in tensors, i.e., multi-dimensional
matrices. Even simple computations with tensors lack the traditional
algorithmic guarantees of two-dimensional matrix computations such as
existence and known runtimes. For instance, eigenvectors are a key
component of spectral clustering, and finding tensor eigenvectors is
NP-hard [15]. An additional challenge is that the number of
higher-order structures increases exponentially with the size of the
structure. For example, in a graph with n nodes, the number of
possible triangles is $O(n^3)$. However, real-world networks have far
fewer triangles.

While there exist several extensions to the spectral method, including
the directed Laplacian [5], the asymmetric Laplacian [4], and
co-clustering [9, 28], these methods are all limited to
two-dimensional graph representations. A simple work-around would be
to weight edges that occur in higher-order structures [19]. However,
this heuristic is unsatisfactory because the optimization is still on
edges, and not on the higher-order patterns we aim to cluster.

Here, we propose a Tensor Spectral Clustering (TSC) framework that is
directly based on higher-order network structures, i.e., network
information beyond edges connecting two nodes. Our framework operates
on a tensor of network data and allows the user to specify which
higher-order network structures (cycles, feed-forward loops, etc.)
should be preserved by the clustering. For example, if one aims to
obtain a partitioning that does not cut triangles, then this can be
encoded in a third-order tensor $\underline{T}$, where
$\underline{T}(i, j, k)$ is equal to 1 if nodes i, j, and k form a
triangle and 0 otherwise.

Given a tensor representation of the desired higher-order network
structures, we then use a multilinear PageRank vector [13] to reduce
the tensor to a two-dimensional matrix. This dimensionality reduction
step allows us to use efficient matrix algorithms while approximately
preserving the higher-order structures represented by the tensor. Our
resulting TSC algorithm is a spectral method that partitions the
network to minimize the number of higher-order structures cut. This
way our algorithm finds subgraphs that contain many instances of the
higher-order structure described by the tensor. Figure 1 illustrates a
directed network in which our goal is to identify clusters of directed
3-cycles. That is, we aim to partition the nodes into two sets such
that few directed 3-cycles get cut. Our TSC algorithm finds a
partition that does not cut any of the directed 3-cycles, while a
standard spectral partitioner (the directed Laplacian [5]) does.

Clustering networks based on higher-order structures has many
applications. For example, the TSC algorithm allows for identifying
layered flows in networks, where the network consists of several
layers that contain many feedback loops. Between layers, there are
many edges, but they flow in one direction and do not contribute to
feedback. We identify such layers by clustering a tensor that
describes small feedback loops (e.g., directed 3-cycles and
reciprocated edges). Similarly, TSC can be applied to anomaly
detection in directed networks, where the tensor encodes directed
3-cycles that have no reciprocated edges. Our TSC algorithm can find
subgraphs that have many instances of this pattern, while other
spectral methods fail to capture these higher-order network
structures.

Our contributions are summarized as follows: 

• In Sec. 3, we develop a tensor spectral clustering framework that
computes directly on higher-order graph structures. We provide
theoretical justifications for our framework in Sec. 4.

• In Sec. 5, we provide two applications (layered flow networks and
anomaly detection) where our tensor spectral clustering algorithm
outperforms standard spectral clustering on small, illustrative
networks.

• In Sec. 6, we use tensor spectral clustering to partition large
networks so that directed 3-cycles are not cut. This provides
additional empirical evidence that our algorithm outperforms
state-of-the-art spectral methods.


[Figure 1, right panel]
Tensor spectral clustering: [0, 1, 2], [3, 4, 5]
Directed Laplacian: [1, 2, 5], [0, 3, 4]

Figure 1: (Left) Network where directed 3-cycles only appear within
the blue or red nodes. (Right) Partitioning found by our proposed
tensor spectral clustering algorithm and the directed Laplacian. Our
proposed algorithm doesn’t cut any directed 3-cycles. Directed
3-cycles are just one higher-order structure that can be used within
our framework.

2 Preliminaries and background 

We now review spectral clustering and conductance cut. The key ideas
are a Markov chain representing a random walk on a graph, a second
left eigenvector of the Markov chain, and a sweep cut that uses the
ordering of the eigenvector to compute conductance scores. In Sec. 3,
we generalize these ideas to tensors and higher-order structures on
graphs.

2.1 Notation and the transition matrix Consider an undirected,
weighted graph $G = (V, E)$, where $n = |V|$ and $m = |E|$. Let
$A \in \mathbb{R}_{+}^{n \times n}$ be the weighted adjacency matrix
of G, i.e., $A_{ij} = w_{ij}$ if $(i, j) \in E$ and $A_{ij} = 0$
otherwise. Let D be the diagonal matrix with generalized degrees of
the vertices of G. In other words, $D = \mathrm{diag}(Ae)$, where e is
the vector of all ones. The combinatorial Laplacian or Kirchhoff
matrix is $K = D - A$. The matrix $P = A^T D^{-1}$ is a column
stochastic matrix, which we call the transition matrix. We now
interpret this matrix as a Markov chain.

2.2 Markov chain interpretation Since P is column stochastic, we can
interpret the matrix as a Markov chain with states $S_t$, for each
time step t. Specifically, the states of the Markov chain are the
vertices of the graph, i.e., $S_t \in V$. The transition probabilities
are given by P:

$\mathrm{Prob}(S_{t+1} = i \mid S_t = j) = P_{ij} = A_{ji} / D_{jj}$.

This Markov chain represents a random walk on the 
graph G. In Sec. 3.2, we will generalize this idea to 
tensors of graph data. We now show how the second left 
eigenvector of the Markov chain described here is key to 
spectral clustering. 
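As a small illustration of the transition matrix and its random-walk reading, the following sketch builds $P = A^T D^{-1}$ for a three-node path graph using plain Python lists. The helper name `transition_matrix` and the example graph are assumptions made for illustration, and the sketch assumes every vertex has positive degree.

```python
def transition_matrix(A):
    """Return P = A^T D^{-1} for a weighted adjacency matrix A (list of rows).

    Assumes every vertex has positive degree, so no column is undefined.
    """
    n = len(A)
    deg = [sum(A[j]) for j in range(n)]  # D = diag(A e): row sums of A
    # P[i][j] = A[j][i] / D[j][j] = Prob(next state is i | current state is j)
    return [[A[j][i] / deg[j] for j in range(n)] for i in range(n)]


# Path graph 0 - 1 - 2 (hypothetical example)
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = transition_matrix(A)

# Every column of P sums to one, i.e. P is column stochastic.
col_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
```

From the middle node the walk moves to either endpoint with probability 1/2, while from an endpoint it must return to the middle node.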

Code used for this paper is available at
https://github.com/arbenson/tensor-sc, and all networks used in
experiments are available from SNAP [23].

2.3 Second left eigenvector for conductance cut The conductance of a
set $S \subset V$ of nodes is

(2.1)  $\phi(S) = \mathrm{cut}(S) / \min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))$,

where $\mathrm{cut}(S) = |\{(u, v) \mid u \in S, v \in \bar{S}\}|$ and
$\mathrm{vol}(S) = |\{(u, v) \mid u \in S\}|$, with $(u, v)$ ranging
over the edges of G. Small conductance indicates a good partition of
the graph: the number of cut edges must be small and neither $S$ nor
$\bar{S}$ can be too small. Let $z \in \{-1, 1\}^n$ be an indicator
vector over the nodes in G, where $z_i = 1$ if the ith node is in S.
Then

(2.2)  $z^T K z = \sum_{(i,j) \in E} 4\, I(z_i \neq z_j) \propto \mathrm{cut}(S)$.


The conductance cut eigenvalue problem is an approximation for the
NP-hard problem of minimizing conductance:

(2.3)  $\underset{z \in \mathbb{R}^n}{\text{minimize}} \;\; z^T K z / z^T D z \quad \text{subject to} \;\; e^T D z = 0, \; \|z\| = 1$


The idea of the real-valued relaxation in Eqn. (2.3) is that positive
and negative values of z correspond to the ±1 indicator vector for the
cut in Eqn. (2.2). In Sec. 2.4 we will review how to convert the
real-valued solution to a cut.

The matrices K and D are positive semi-definite, and Eqn. (2.3) is a
generalized eigenvalue problem. In particular, the solution is the
vector z such that $Kz = \lambda Dz$, where $\lambda$ is the second
smallest generalized eigenvalue (the smallest eigenvalue is 0 and
corresponds to the trivial solution $z = e$). To get the solution z,
we observe that

$Kz = \lambda Dz \iff (I - D^{-1}A)z = \lambda z \iff z^T P = (1 - \lambda) z^T$,

where $1 - \lambda$ is the second largest left eigenvalue of P. We
know that $e^T P = e^T$, so we are looking for the dominant left
eigenvector that is orthogonal to the trivial one.

Here, we call the above partitioning algorithm for undirected graphs
the “undirected Laplacian” method. One generalization to directed
graphs is due to Chung [5]. For this method, we use the undirected
Laplacian method on the following symmetrized network:
$A_{sym} := \frac{1}{2}(\Pi P^T + P \Pi)$, where $P = A^T D^{-1}$ and
$\Pi = \mathrm{diag}(\pi)$ for $P\pi = \pi$, the stationary
distribution of P. Note that $D_{sym} = \mathrm{diag}(A_{sym} e) = \Pi$,
so we are interested in the second left eigenvector of

(2.4)  $P_{sym} = \frac{1}{2}(\Pi P^T \Pi^{-1} + P)$.

By “directed Laplacian”, we refer to the method that uses the second
left eigenvector of $P_{sym}$.


2.4 Sweep cuts In order to round the real-valued solution z to a
solution set S to evaluate Eqn. (2.1), we sort the vertices by the
values $z_i$ and consider vertex sets $S_k$ that consist of the first
k nodes in the sorted vertex list. In other words, if $\sigma_i$ is
equal to the index of the ith smallest element of z, then
$S_k = \{\sigma_1, \sigma_2, \ldots, \sigma_k\}$. We then choose
$S = \arg\min_{S_k} \phi(S_k)$. The set of nodes S satisfies the
celebrated Cheeger inequality [1]: $\phi_* \le \phi(S) \le 2\sqrt{\phi_*}$,
where $\phi_*$ is the minimum conductance over all cuts. The sweep cut
computation is fast, since $S_{k+1}$ differs from $S_k$ by only one
node, and the sequence of scores $\phi(S_1), \ldots, \phi(S_n)$ can be
computed in $O(n + m)$ time.
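The incremental update behind the sweep can be sketched as follows: when a node joins the prefix set, each of its edges either stops being cut (the neighbor is already inside) or starts being cut (the neighbor is still outside). The function name `sweep_cut` and the adjacency-list input format are assumptions for illustration; the sketch handles unweighted, undirected graphs without self-loops or isolated nodes.

```python
def sweep_cut(order, adj):
    """Return (best_set, best_conductance) over proper prefixes of `order`.

    adj maps each node to a list of neighbors; every undirected edge
    appears in both endpoint lists. Runs in O(n + m) time.
    """
    vol_total = sum(len(nbrs) for nbrs in adj.values())
    in_set = set()
    cut = 0   # number of edges leaving the current prefix set
    vol = 0   # sum of degrees inside the current prefix set
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order[:-1], start=1):  # skip the full vertex set
        in_set.add(u)
        vol += len(adj[u])
        for w in adj[u]:
            # Edge to a node already inside is no longer cut; otherwise it is newly cut.
            cut += -1 if w in in_set else 1
        phi = cut / min(vol, vol_total - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k]), best_phi


# Two triangles joined by the single edge 2-3 (hypothetical example)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S, phi = sweep_cut([0, 1, 2, 3, 4, 5], adj)
```

For this ordering the best prefix separates the two triangles, cutting one edge with volume 7 on each side.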

In addition to conductance, other scores can also be computed in the
same sweeping fashion. Of particular interest are the normalized cut,
$\mathrm{ncut}(S) = \mathrm{cut}(S)\,(1/\mathrm{vol}(S) + 1/\mathrm{vol}(\bar{S}))$,
and the expansion, $\rho(S) = \mathrm{cut}(S) / \min(|S|, |\bar{S}|)$.
The normalized cut differs by at most a factor of two from
conductance, so we will limit ourselves to conductance and expansion
in this paper.
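The factor-of-two relationship between normalized cut and conductance follows directly from the two definitions. As a short worked step, write $a = \min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))$ and $b = \max(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))$ (the symbols a and b are introduced here only for this derivation):

```latex
\[
  \frac{1}{a} \;\le\; \frac{1}{a} + \frac{1}{b} \;\le\; \frac{2}{a}
  \quad\Longrightarrow\quad
  \phi(S) \;=\; \frac{\mathrm{cut}(S)}{a}
  \;\le\; \mathrm{ncut}(S)
  \;\le\; \frac{2\,\mathrm{cut}(S)}{a} \;=\; 2\,\phi(S).
\]
```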

3 Tensor spectral clustering framework 

The key ingredients for spectral clustering discussed in 
Sec. 2 were a transition matrix from an undirected graph, 
a Markov chain interpretation of the transition matrix, and 
the second left eigenvector of the Markov chain. We now 
generalize these ideas for higher-order network structures. 

3.1 Transition tensors Our first goal is to represent the higher-order
network structures of interest. For example, to represent structures
on three nodes (e.g., directed cycles or feed-forward loops) we
require a three-dimensional tensor. In particular, we want a symmetric
order-3 tensor $\underline{T} \in \mathbb{R}^{n \times n \times n}$
such that the entry at index $(i, j, k)$ contains information about
nodes $i, j, k \in V$. (Here, symmetric means that the value of
$\underline{T}(i, j, k)$ remains the same under any permutation of the
three indices.) A tensor describing triangles in G is:

(3.5)  $\underline{T}(i, j, k) = I(i, j, k \in V \text{ distinct and form a triangle})$.

This tensor represents third-order information about the graph. We
form a transition tensor by

$\underline{P}(i, j, k) = \underline{T}(i, j, k) \,/\, \sum_{i=1}^{n} \underline{T}(i, j, k), \quad 1 \le i, j, k \le n$.

In the case that $\sum_{i=1}^{n} \underline{T}(i, j, k) = 0$, we fill
in $\underline{P}(:, j, k)$ with a stochastic vector u, i.e.,
$\underline{P}(:, j, k) = u$. We call the vector u the dangling
distribution vector, borrowing the term from the PageRank community
[3]. Next, we see how to interpret this transition tensor as a
second-order Markov chain.
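To make the construction concrete, here is a minimal sketch that builds the triangle tensor of Eqn. (3.5) as a sparse dictionary, normalizes it over the first index, and falls back to the dangling vector u for pairs (j, k) that share no triangle. The function names, the dictionary layout, and the four-node example graph are assumptions for illustration, not the authors' implementation.

```python
from itertools import permutations


def triangle_tensor(n, edges):
    """Sparse symmetric tensor: T[(i, j, k)] = 1.0 if i, j, k form a triangle."""
    nbrs = [set() for _ in range(n)]
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    T = {}
    for i in range(n):
        for j in nbrs[i]:
            if j <= i:
                continue
            for k in nbrs[i] & nbrs[j]:
                if k > j:  # visit each triangle once, then store all 6 index orders
                    for idx in permutations((i, j, k)):
                        T[idx] = 1.0
    return T


def transition_tensor(T):
    """Divide each stored entry by the sum over the first index for its (j, k)."""
    col_sum = {}
    for (i, j, k), val in T.items():
        col_sum[(j, k)] = col_sum.get((j, k), 0.0) + val
    return {(i, j, k): val / col_sum[(j, k)] for (i, j, k), val in T.items()}


def column(P, n, j, k, u):
    """Dense column P(:, j, k); dangling columns are replaced by u."""
    col = [P.get((i, j, k), 0.0) for i in range(n)]
    return col if any(col) else list(u)


# Two triangles {0, 1, 2} and {1, 2, 3} sharing the edge 1-2 (hypothetical example)
n = 4
edges = [(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)]
T = triangle_tensor(n, edges)
P = transition_tensor(T)
u = [1.0 / n] * n
```

Here the pair (1, 2) lies in two triangles, so the column for (1, 2) splits its mass evenly between nodes 0 and 3, while the pair (0, 3) lies in none and receives u.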

3.2 Second-order Markov chains and the spacey random surfer Next, we
seek to generalize the Markov chain interpretation of spectral
clustering to tensors. While spectral clustering on matrices is
analogous to a first-order Markov chain, we will show that tensor
spectral clustering is analogous to a second-order Markov chain on a
matrix representation of the tensor.

Entries of the transition tensor $\underline{P}$ from Sec. 3.1 can be
interpreted as the transition probabilities of a second-order Markov
chain. Specifically, given a second-order Markov chain with state
space the set of vertices, V, we define the transition probabilities
as

$\underline{P}(i, j, k) = \mathrm{Prob}(S_{t+1} = i \mid S_t = j, S_{t-1} = k)$.

In other words, the probability of moving to state i depends on the
current state j and the last state k. For the triangle tensor in
Eqn. (3.5),

$\underline{P}(i, j, k) = \dfrac{I(i, j, k \text{ form triangle})}{\#(\text{triangles involving nodes } j \text{ and } k)}$.

If the previous state was node k and the current state is 
node j, then, for the next state, the Markov chain chooses 
uniformly over all nodes i that form a triangle with j and k. 

The stationary distribution $X_{ij}$ of the second-order Markov chain
satisfies $\sum_k \underline{P}(i, j, k) X_{jk} = X_{ij}$. We would
like to model the full second-order dynamics of the Markov chain, but
doing so is computationally infeasible because just storing the
stationary distribution requires $\Theta(n^2)$ memory. Instead, we
will make the simplifying assumption that $X_{ij} = x_i x_j$ for some
vector $x \in \mathbb{R}^n$ with $e^T x = 1$. The stationary
distribution then satisfies

(3.6)  $\sum_{1 \le j, k \le n} \underline{P}(i, j, k)\, x_j x_k = x_i$.

With respect to Eqn. (3.6), x is called a z eigenvector of the tensor
$\underline{P}$ with eigenvalue 1 [27]. To simplify notation, we will
denote the one-mode unfolding of $\underline{P}$ by
$R \in \mathbb{R}^{n \times n^2}$, namely
$R = [\underline{P}(:, :, 1) \;\; \underline{P}(:, :, 2) \;\; \cdots \;\; \underline{P}(:, :, n)]$.
The matrix R is a column stochastic matrix. We use
$R_k = \underline{P}(:, :, k)$ to denote the kth $n \times n$ block of
R. With this notation, Eqn. (3.6) reduces to $R \cdot (x \otimes x) = x$,
where $\otimes$ denotes the Kronecker product.

The simplifying approximation $X_{ij} = x_i x_j$ is computationally
and algebraically appealing, but we also want a random process to
interpret the vector. Recent work [13] has considered the multilinear
PageRank vector x that satisfies

(3.7)  $\alpha R \cdot (x \otimes x) + (1 - \alpha) v = x, \quad x_k \ge 0, \quad e^T x = 1$,

for a constant $\alpha \in (0, 1)$ and stochastic vector v.

This vector is the stationary distribution of a stochastic process
recently termed the spacey random surfer [12]. At any step of the
process, a random surfer has just moved from node k to node j. With
probability $(1 - \alpha)$, the surfer teleports to a random state via
the stochastic vector v. With probability $\alpha$, the surfer wants
to transition to node i with probability $\underline{P}(i, j, k)$.
However, the surfer spaces out and forgets that s/he came from node k.
Instead, the surfer guesses the previous state, based on the
historical distribution over the state space. Formally, the surfer
guesses node $\ell$ with probability
$\frac{1}{t + n}\left(1 + \sum_{r=1}^{t} I[S_r = \ell]\right)$. It is
important to note that although this process is an approximation to a
second-order Markov chain, the process is no longer Markovian.

3.3 Second left eigenvector Following the steps of spectral
clustering, we now need to obtain an equivalent of the second left
eigenvector (Sec. 2.3). In particular, we now show how to get a
relevant eigenvector from the multilinear PageRank vector x and the
transition tensor $\underline{P}$. The multilinear PageRank vector x
satisfying $\alpha R \cdot (x \otimes x) + (1 - \alpha) v = x$ can
also be re-interpreted as the stationary distribution of a particular
Markov chain. Specifically, define the matrix

(3.8)  $P[x] := \sum_{k=1}^{n} x_k R_k$.

(Recall that $R_k = \underline{P}(:, :, k)$ is the kth $n \times n$
block of R.) The matrix $P[x]$ is column stochastic because each $R_k$
is column stochastic and $\sum_{k=1}^{n} x_k = 1$. Note that

$R \cdot (x \otimes x) = \sum_{k=1}^{n} R_k (x_k x) = \left(\sum_{k=1}^{n} x_k R_k\right) x = P[x] \cdot x$.

Hence, x is the stationary distribution of the PageRank system
$\alpha P[x] \cdot x + (1 - \alpha) v = x$. However, the transition
matrix depends on x itself.

We use the second left eigenvector of $P[x]$ for our higher-order
spectral clustering algorithm. Heuristically, $P[x]$ is a weighted sum
of n “views” of the graph (the matrices $R_k$), from the perspective
of each node ($k$, $1 \le k \le n$), according to three-dimensional
graph data (the tensor $\underline{P}$). If node k has a large
influence on the three-dimensional data, then $x_k$ will be large and
we will weight data associated with node k more heavily. The ordering
of the eigenvector will be used for a sweep cut on the vertices.

3.4 Sweep cuts The last remaining step is to generalize the notion of
the sweep cut (Sec. 2.4). Recall that the sweep cut takes some
ordering on the nodes, $\sigma$, and computes some score $f(S_k)$ for
each cut $S_k = \{\sigma_1, \ldots, \sigma_k\}$. Finally, the sweep
cut procedure returns $\arg\min_k f(S_k)$. The eigenvector from
Sec. 3.3 provides us with an ordering for a sweep cut, just as in the
two-dimensional case (Sec. 2.4). We generalize the cut and volume
measures as follows:

$\mathrm{cut}_3(S) := \sum_{i,j,k \in V} \underline{T}(i, j, k) - \sum_{i,j,k \in S} \underline{T}(i, j, k) - \sum_{i,j,k \in \bar{S}} \underline{T}(i, j, k)$

$\mathrm{vol}_3(S) := \sum_{i \in S;\; j,k \in V} \underline{T}(i, j, k)$.

And we define “higher-order conductance” (denoted $\phi_3$) and
“higher-order expansion” (denoted $\rho_3$) as

(3.9)  $\phi_3(S) := \dfrac{\mathrm{cut}_3(S)}{\min(\mathrm{vol}_3(S), \mathrm{vol}_3(\bar{S}))}$

(3.10)  $\rho_3(S) := \dfrac{\mathrm{cut}_3(S)}{\min(|S|, |\bar{S}|)}$.

This definition ensures that $\phi_3(S) \in [0, 1]$, as in standard
conductance.
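The higher-order cut and volume can be evaluated directly from a sparse tensor. The sketch below assumes the tensor is stored as a dictionary keyed by (i, j, k) with every index permutation present, and that the volume sums over entries whose first index lies in S; the function names and the three-triangle example are hypothetical.

```python
from itertools import permutations


def symmetric_tensor(triples):
    """Store weight 1.0 at every permutation of each node triple."""
    T = {}
    for triple in triples:
        for idx in permutations(triple):
            T[idx] = 1.0
    return T


def cut3(T, S):
    """Total tensor weight minus the weight entirely inside S or entirely outside S."""
    total = inside = outside = 0.0
    for (i, j, k), val in T.items():
        total += val
        flags = (i in S, j in S, k in S)
        if all(flags):
            inside += val
        elif not any(flags):
            outside += val
    return total - inside - outside


def vol3(T, S):
    """Sum of entries whose first index lies in S."""
    return sum(val for (i, j, k), val in T.items() if i in S)


def phi3(T, S, nodes):
    """Higher-order conductance score of the set S."""
    complement = set(nodes) - set(S)
    return cut3(T, S) / min(vol3(T, S), vol3(T, complement))


# Triangles {0,1,2} and {3,4,5}, bridged by the triangle {2,3,4}
T = symmetric_tensor([(0, 1, 2), (3, 4, 5), (2, 3, 4)])
S = {0, 1, 2}
score = phi3(T, S, range(6))
```

Only the bridging triangle is cut by S = {0, 1, 2}, and it contributes once for each of its six stored index orders.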







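Algorithm 1: Tensor Spectral Partitioning
Data: $G = (V, E)$, $|V| = n$, $\underline{T} \in \mathbb{R}^{n \times n \times n}$,
      dangling distribution vector u, $\alpha \in (0, 1)$
Result: Set of nodes $S \subset V$
for $1 \le i, j, k \le n$ with $\underline{T}(i, j, k) \neq 0$ do
    $\underline{P}(i, j, k) \leftarrow \underline{T}(i, j, k) / \sum_i \underline{T}(i, j, k)$
for $j, k$ such that $\sum_i \underline{T}(i, j, k) = 0$ do
    $\underline{P}(:, j, k) \leftarrow u$
$x \leftarrow$ MultilinearPageRank($\alpha$, $\underline{P}$)
$R_k \leftarrow \underline{P}(:, :, k)$
$P[x] \leftarrow \sum_k x_k R_k$
Compute second left eigenvector z of $P[x]$
$\sigma \leftarrow$ sorted ordering of z
$S \leftarrow$ SweepCut($\sigma$, G)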
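The collapse step of Algorithm 1, forming $P[x] = \sum_k x_k R_k$, needs only one pass over the stored entries of the transition tensor. The sketch below assumes the tensor is a dictionary keyed by (i, j, k) in which every column, including any dangling fill, is stored explicitly; the name `collapse` and the 2-node example values are illustrative assumptions.

```python
def collapse(P, x, n):
    """Return the dense n x n matrix P[x] with entries sum_k x[k] * P(i, j, k)."""
    M = [[0.0] * n for _ in range(n)]
    for (i, j, k), val in P.items():
        M[i][j] += x[k] * val
    return M


# A tiny column-stochastic tensor on two nodes (hypothetical values):
# each column P(:, j, k) sums to one.
P = {
    (0, 0, 0): 0.5, (1, 0, 0): 0.5,   # column (j=0, k=0)
    (0, 1, 0): 1.0,                   # column (j=1, k=0)
    (1, 0, 1): 1.0,                   # column (j=0, k=1)
    (0, 1, 1): 0.5, (1, 1, 1): 0.5,   # column (j=1, k=1)
}
x = [0.25, 0.75]
M = collapse(P, x, 2)
col_sums = [M[0][j] + M[1][j] for j in range(2)]
```

Because x sums to one and each block $R_k$ is column stochastic, the columns of the result again sum to one.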


Algorithm 2: Tensor Spectral Clustering (TSC)
Data: $G = (V, E)$, $|V| = n$, $\underline{T} \in \mathbb{R}^{n \times n \times n}$,
      dangling distribution vector u, $\alpha \in (0, 1)$, number of
      clusters C
Result: Partition $\mathcal{P}$ of V
if $|\mathcal{P}| < C$ then
    Partition G into $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ via Algorithm 1.
    $\mathcal{P} = \mathcal{P} \cup \{V_1, V_2\}$.
    Recurse on largest component in $\mathcal{P}$.


3.5 Tensor spectral clustering framework We now have 
higher-order analogs of all the spectral clustering tools from 
Sec. 2. The central routine of our tensor spectral clustering 
framework is given in Algorithm 1, which is the tensor 
spectral partitioning algorithm. This subroutine takes a 
data tensor T of third-order information about a graph G 
and partitions the nodes into two sets. Algorithm 2 is 
the clustering algorithm that performs recursive bisection in 
order to decompose the graph into several components. This 
algorithm can also be used with other partitioning algorithms 
[11], and we will take that approach in Sec. 5. 
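The recursive bisection driver is independent of how each two-way split is produced. A minimal sketch, assuming the bisection routine is passed in as a callable (standing in for Algorithm 1) and using a toy splitter purely as a placeholder:

```python
def recursive_bisection(nodes, bisect, C):
    """Split `nodes` into at most C parts by repeatedly bisecting the largest part.

    `bisect` is any callable mapping a set of nodes to two disjoint subsets;
    in the framework above it would wrap the tensor spectral partitioner.
    """
    parts = [set(nodes)]
    while len(parts) < C:
        largest = max(parts, key=len)
        if len(largest) < 2:
            break  # nothing left to split
        parts.remove(largest)
        first, second = bisect(largest)
        parts.extend([first, second])
    return parts


def halve(part):
    """Toy stand-in for a real partitioner: split the sorted nodes in half."""
    ordered = sorted(part)
    mid = len(ordered) // 2
    return set(ordered[:mid]), set(ordered[mid:])


parts = recursive_bisection(range(8), halve, C=3)
```

With a real partitioner, `bisect` would rebuild the tensor restricted to the chosen part before splitting it.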

3.6 Complexity The complexity of Algorithm 2 depends on the sparsity
of the data tensor $\underline{T}$, i.e., the number of higher-order
structures in the network. The algorithm depends on the sparsity in
three ways. First, all of the higher-order structures in the network
must be enumerated as an upfront cost. Second, the sparsity affects
the complexity of the multilinear PageRank subroutine in Algorithm 1.
Third, the number of non-zeroes in $P[x]$ is equal to the number of
the higher-order structures. When performing recursive bisection
(Algorithm 2), there is no upfront cost to enumerate the structures;
we only need to determine which structures are preserved under the
partition.


We argue that the upfront cost is not cumbersome. 
Triangle enumeration for real-world undirected networks is 
a well-studied problem [7, 30]. For directed graphs, we can: 
(1) undirect the graph, (2) use high-performance code to 
enumerate the triangles, and (3) stream through the triangles 
and only keep those that are the directed structure of interest. 

Now, we consider the second and third computations. Let T be the
number of non-zeroes in $\underline{T}$. There are several methods for
computing the multilinear PageRank vector in Algorithm 1 [13]. We use
the shifted fixed point method (akin to the symmetric higher-order
power method [20]). Each iteration takes $O(T)$ time, and we found
that this method converges very quickly, usually within a handful of
iterations. The computation of the second left eigenvector of $P[x]$
dominates the running time. We use the power method to compute this
eigenvector. Since $P[x]$ has T non-zero entries, each iteration takes
$O(T)$ time.
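A shifted fixed-point iteration of this kind can be sketched in a few lines. The update rule below, which averages the multilinear PageRank map with the current iterate using a shift `gamma`, is an assumption about the general form of such a method rather than a transcription of the routine in [13]; the function name, default parameters, and the 2-node tensor are hypothetical. Each pass touches every stored tensor entry once.

```python
def multilinear_pagerank(P, n, alpha, v, gamma=0.5, tol=1e-12, max_iter=1000):
    """Iterate x <- (alpha * R(x kron x) + (1 - alpha) * v + gamma * x) / (1 + gamma).

    P is a sparse column-stochastic tensor stored as {(i, j, k): value},
    with every column (including dangling fill) present explicitly.
    """
    x = list(v)
    for _ in range(max_iter):
        y = [0.0] * n
        for (i, j, k), val in P.items():
            y[i] += val * x[j] * x[k]          # y = R (x kron x)
        x_new = [(alpha * y[i] + (1 - alpha) * v[i] + gamma * x[i]) / (1 + gamma)
                 for i in range(n)]
        converged = sum(abs(a - b) for a, b in zip(x_new, x)) < tol
        x = x_new
        if converged:
            break
    return x


# Hypothetical 2-node tensor; every column P(:, j, k) sums to one.
P = {
    (0, 0, 0): 0.5, (1, 0, 0): 0.5,
    (1, 1, 0): 1.0,
    (1, 0, 1): 1.0,
    (0, 1, 1): 0.5, (1, 1, 1): 0.5,
}
x = multilinear_pagerank(P, 2, alpha=0.4, v=[0.5, 0.5])
```

The example uses a small alpha, for which simple fixed-point iterations are known to behave well; larger values such as those used in the experiments may need more care.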

Finally, we look at the relationship between T and the size of the
graph. In theory, T can be $O(n^3)$, but this is far from what we see
in practice. For the large networks considered in Sec. 6, T < 6m (see
Table 1).

To summarize, the majority of our time is spent computing the
eigenvector of $P[x]$. Each iteration takes $O(T)$ time, and T < 6m
for the algorithms we consider. Standard spectral algorithms also
compute an eigenvector with the power method, but each iteration is
only $O(m)$ time. Thus, we can think of our algorithms as running
within an order of magnitude of standard algorithms. However, when
moving beyond third-order structures, we note that T can be much
larger.

4 Generalizations and directed 3-cycle cut 

Before transitioning to applications, we mention two important
generalizations of our framework and discuss directed 3-cycle cuts.
The directed 3-cycle will play an important role for our applications
in Sections 5 and 6.

4.1 Generalizations Our first generalization deals with data beyond
three dimensions. While we have presented the algorithm with
three-dimensional data, the same ideas carry through for higher-order
data. The multilinear PageRank vector can still be computed, although
$\alpha$ must be smaller to guarantee convergence [13]. However, in
practice, we do not observe large $\alpha$ impeding convergence.

Second, our TSC algorithm is a strict generalization of traditional
spectral clustering in the following sense. There is a data tensor
$\underline{T}$ such that for any multilinear PageRank vector x, we
compute the same eigenvector that conductance cut computes. In
particular, we can always define $\underline{T}(i, j, k) = A_{ij}$,
where A is the adjacency matrix. Then $R_k = P$, $1 \le k \le n$, and
$P[x] = \sum_k x_k P = P \sum_k x_k = P$.

4.2 Directed 3-cycle cuts We now turn our attention to a particular
three-dimensional representation of directed graph data: directed
3-cycles (D3Cs), i.e., sets of edges $(i, j)$, $(j, k)$, and $(k, i)$
for distinct nodes i, j, and k. Such structures are important for
community detection [19] and are natural motifs for network feedback.
We will use this structure for applications in Sections 5 and 6. The
data tensor we use for directed 3-cycle cuts is

(4.11)  $\underline{T}(i, j, k) = \begin{cases} 2 & i, j, k \text{ form two D3Cs} \\ 1 & i, j, k \text{ form one D3C} \\ 0 & \text{otherwise} \end{cases}$

Nodes i, j, and k form two D3Cs if and only if every possible directed
edge between them is present. When $\underline{T}(i, j, k) = 1$, we do
not differentiate between 0, 1, or 2 reciprocated edges. For directed
3-cycle cut, we want to find partitions of the graph that do not cut
many D3Cs.
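The data tensor of Eqn. (4.11) can be assembled by counting, for each unordered node triple, how many of its two possible cyclic orientations are present. The brute-force loop over all triples below is only an illustrative sketch for small graphs (the function name and example edges are assumptions); a practical implementation would enumerate triangles first, as discussed in Sec. 3.6.

```python
from itertools import combinations, permutations


def d3c_tensor(n, edges):
    """Sparse symmetric tensor with T[(i, j, k)] = number of D3Cs on {i, j, k}."""
    E = set(edges)
    T = {}
    for i, j, k in combinations(range(n), 3):
        clockwise = (i, j) in E and (j, k) in E and (k, i) in E
        counter = (i, k) in E and (k, j) in E and (j, i) in E
        count = int(clockwise) + int(counter)
        if count:
            for idx in permutations((i, j, k)):
                T[idx] = float(count)
    return T


# One plain 3-cycle on {0, 1, 2} and a fully reciprocated triple {3, 4, 5}
edges = [(0, 1), (1, 2), (2, 0),
         (3, 4), (4, 3), (4, 5), (5, 4), (3, 5), (5, 3)]
T = d3c_tensor(6, edges)
```

The fully reciprocated triple contains both orientations and therefore receives the value 2 at each of its index orders.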

4.3 Strongly connected components We now show that TSC correctly
breaks up strongly connected components when using the data tensor in
Eqn. (4.11). Suppose we have an undirected graph $G = (V, E)$ with two
connected components $V_1$ and $V_2$. A standard result of the
spectral method for conductance cut on undirected graphs (Sec. 2.3) is
that there is a second left eigenvector z of P such that
$z^T P = z^T$, and $\mathrm{sign}(z_i) = -\mathrm{sign}(z_j)$ for
$i \in V_1$, $j \in V_2$ [6]. This means that the ordering induced by
the eigenvector correctly separates the components. A similar result
holds for strongly connected components in a directed graph G using
the directed Laplacian.

We now present a similar result for directed 3-cycle 
cut. First, we observe the following: there is no directed 
3-cycle that has nodes from different strongly connected 
components. Now, Lemma 4.1 shows that if we have a 
graph with two strongly connected components, then, under 
some conditions, the second left eigenvector computed by 
Algorithm 1 correctly partitions the two strongly connected 
components. 

Lemma 4.1. Consider a directed graph $G = (V, E)$ with two components
$V_1$ and $V_2$ such that there are no directed 3-cycles containing a
node $i \in V_1$ and $j \in V_2$. Assume that the directed 3-cycle
tensor $\underline{T}$ is given by Eqn. (4.11). Augment the
corresponding transition matrices $R_k$ with a sink node t so that
transitions involving $j \in V_1$, $k \in V_2$ (or vice versa) jump to
the sink node, i.e., $\underline{P}(i, j, k) = I(i = t)$. Finally,
instead of using the dangling distribution vector u to fill in
$\underline{P}$, assume that when $\sum_i \underline{T}(i, j, k) = 0$
for $j, k \in V_1$, $\underline{P}(i, j, k) = I(i \in V_1) / |V_1|$.
(And the same for transitions involving $j, k \in V_2$.)

Then $P[x]$ has a second left eigenvector z with eigenvalue 1 such
that $z^T e = 0$ and $\mathrm{sign}(z_i) = -\mathrm{sign}(z_j)$ for
any $i \in V_1$, $j \in V_2$.

Proof. See the full version of the paper, available from
https://github.com/arbenson/tensor-sc.


5 Applications on synthetic networks 

We now explore applications of our TSC framework. The purpose of this
section is to illustrate that explicitly partitioning higher-order
network data can improve partitioning and clustering on directed
networks. The examples that follow are small and synthetic but
illustrative. In future work, we plan to use these ideas on real data
sets.

For the applications in this section, we use the following parameters
for the tensor spectral clustering algorithm: $\alpha = 0.99$ for the
multilinear PageRank vector, $\gamma = 0.01$ for SS-HOPM,
$u = v = \frac{1}{n} e$, and the higher-order conductance score
function (Eqn. (3.9)).

5.1 Layered flow networks Our first example is a network consisting of
multiple layers, where feedback loops primarily occur within a layer.
Information tends to flow “downwards” from one layer to the next. In
other words, most edges between two layers point in the same
direction. Figure 2 gives an example of such a network with three
layers, each consisting of four nodes.

We are interested in separating the layers of the network via our TSC
algorithm. Feedback in a directed network is synonymous with directed
cycles. For this example, we count all directed 2-cycles (i.e.,
reciprocated edges) and directed 3-cycles. In order to account for the
directed 2-cycles, we will say that the data tensor $\underline{T}$ is
equal to one for any index of the form $(i, i, j)$, $(i, j, i)$, or
$(j, i, i)$ when nodes i and j have reciprocated edges. Formally, the
data tensor is:

$\underline{T}(i, j, k) = \begin{cases} 2 & i, j, k \text{ distinct and form two D3Cs} \\ 1 & i, j, k \text{ distinct and form one D3C} \\ 1 & (k = j \text{ or } k = i) \text{ and } (i, j), (j, i) \in E \\ 1 & j = i \text{ and } (i, k), (k, i) \in E \\ 0 & \text{otherwise} \end{cases}$

Figure 2 lists the three communities found by (1) TSC (Algorithm 2
with C = 3), (2) the directed Laplacian (DL), and (3) the directed
Laplacian on the subgraph only including edges involved in at least
one directed 2-cycle or directed 3-cycle (Sub-DL). TSC is the only
method that correctly identifies the three communities. Sub-DL
performs almost as well, but misclassifies node 1, placing it with the
green nodes two layers beneath. In general, DL does not do well
because there are a large number of edges between layers, and the
algorithm does not want to cut these edges.

Figure 2: (Left) Layered flow network, where almost all feedback
occurs at three different layers (specified by the blue, red, and
green nodes). There are many edges going from one layer to the layers
below it. (Right) Three communities found when using TSC, the directed
Laplacian (DL), and the directed Laplacian on the subgraph of edges
participating in at least one directed 2- or 3-cycle (Sub-DL). Only
TSC correctly identifies all three communities.

5.2 Anomaly detection Our second example is anomaly detection. In many
real networks, most directed 3-cycles have at least one reciprocated
edge [19]. Thus, a set of nodes with many directed 3-cycles and few
reciprocated edges between them would be highly anomalous. The goal of
this example is to show that our TSC framework can find such sets of
nodes when they are planted in a network.

Figure 3 shows a network where the anomalous cluster we want to
identify is nodes 0-5. All triangles between nodes 0-5 are directed
3-cycles with no reciprocated edges. Nodes 6-21 connect to each other
according to an Erdős-Rényi model with edge probability 0.25. Finally,
nodes 0-5 each have four outgoing and two incoming edges with nodes
6-21. In total, there are 18 directed 3-cycles with no reciprocated
edges, and 8 of them occur between nodes 0-5.

[Figure 3, right panel]
TSC: {0, 1, 2, 3, 4, 5, 12, 13, 16}
DL: {1, 4, 5, 7, 8, 12, 13, 15, 18, 20}
Sub-DL: {0, 1, 4, 5, 9, 11, 16, 17, 19, 20}

Figure 3: (Left) Network with planted anomalous cluster (nodes 0-5).
Between these nodes, there are many directed 3-cycles with no
reciprocated edges (thick black lines). Nodes 6-22 follow an
Erdős-Rényi graph pattern with edges indicated by dashed lines.
(Right) Smaller of two communities found by TSC, the directed
Laplacian (DL), and the directed Laplacian on the subgraph with only
edges involved in a directed 3-cycle with no reciprocated edges
(Sub-DL). Only TSC finds the entire anomalous cluster.

To use the TSC framework, we form a data tensor that 
only counts directed 3-cycles with no reciprocated edges: 

\[
T(i, j, k) = I\big((i, j), (j, k), (k, i) \in E,\ (j, i), (k, j), (i, k) \notin E\big)
           + I\big((j, i), (k, j), (i, k) \in E,\ (i, j), (j, k), (k, i) \notin E\big)
\]
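
The data tensor above can be sketched in a few lines of Python. This is
only an illustration under our own assumptions: the function name
unreciprocated_3cycle_tensor and the sparse dictionary representation
are hypothetical and are not taken from the paper's implementation.
The two indicator terms cover the two orientations of a cycle, so every
ordering of the three nodes of a qualifying cycle receives the value 1.

```python
from itertools import permutations


def unreciprocated_3cycle_tensor(edges):
    """Sparse tensor {(i, j, k): 1} for directed 3-cycles whose three
    edges all lack a reverse edge (illustrative sketch only)."""
    E = set(edges)
    succ = {}
    for u, v in E:
        succ.setdefault(u, set()).add(v)

    def one_way_cycle(a, b, c):
        # a->b, b->c, c->a are edges and none of the reverse edges exist.
        forward = (a, b) in E and (b, c) in E and (c, a) in E
        reverse = (b, a) in E or (c, b) in E or (a, c) in E
        return forward and not reverse

    T = {}
    for i, j in E:
        for k in succ.get(j, ()):
            if len({i, j, k}) == 3 and one_way_cycle(i, j, k):
                # Exactly one of the two indicator terms is 1 for each
                # ordering of the three nodes, so all six entries are 1.
                for p in permutations((i, j, k)):
                    T[p] = 1
    return T


if __name__ == "__main__":
    T = unreciprocated_3cycle_tensor([(0, 1), (1, 2), (2, 0)])
    print(sorted(T))
```

Adding the reverse edge (1, 0) to the toy input removes the cycle from
the tensor, which is the behavior the anomaly-detection example relies on.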

Figure 3 lists the smaller of the two communities found
by (1) TSC (Algorithm 2 with C = 2), (2) the directed
Laplacian (DL), and (3) the directed Laplacian on the
subgraph only including edges involved in at least one directed
3-cycle with no reciprocated edges (Sub-DL). We see that
only TSC correctly captures the planted anomalous
community. DL does not capture any information about directed
3-cycles with no reciprocated edges, and hence the cut does
not make sense in this context. Sub-DL correctly captures
nodes 0, 1, 4, and 5, but misses nodes 2 and 3.

6 Directed 3-cycle cuts on large networks 

We now transition to real data sets and show that our tensor
spectral partitioning algorithm provides good cuts for the
directed 3-cycle (D3C) data tensor given by Eqn. (4.11). We
compare the following algorithms:

• TSC: This is our proposed method (Algorithm 2 with
C = 2), where the data tensor is given by Eqn. (4.11).
The sweep cut ordering is provided by the second left
eigenvector of P[x].

• Undirected Laplacian (UL): The sweep cut ordering is 
provided by the second left eigenvector of the transition 
matrix of the undirected version of the graph. 


• Directed Laplacian (DL) [5]: The sweep cut ordering
is provided by the second left eigenvector of the matrix
in Eqn. (2.4).

• Asymmetric Laplacian (AL) [4]: The sweep cut ordering
is provided by the second left eigenvector of P.

• Co-clustering (Co) [9, 28]: The sweep cut ordering
is based on the second left and right singular vectors
of a normalized adjacency matrix. Specifically, let
$D_{\mathrm{row}} = \mathrm{diag}(Ae)$ and
$D_{\mathrm{col}} = \mathrm{diag}(A^T e)$, and let $U \Sigma V^T$
be the singular value decomposition of
$D_{\mathrm{row}}^{-1/2} A D_{\mathrm{col}}^{-1/2}$.
Then the sweep cut ordering is provided by
$D_{\mathrm{row}}^{-1/2} U(:, 2)$ or $D_{\mathrm{col}}^{-1/2} V(:, 2)$.
We take the better of the two cuts.

• Random: The sweep cut ordering is random. This 
provides a simple baseline. 
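
Each method in the list above supplies only an ordering of the nodes;
the sweep then evaluates every prefix of that ordering as a candidate
set S. The sketch below illustrates this step under simplifying
assumptions: it reports the raw number of directed 3-cycles cut by each
prefix rather than the normalized scores of Eqns. (3.9) and (3.10), and
the helper names ordering_from_vector and sweep_cut_counts are
hypothetical.

```python
def ordering_from_vector(vec):
    """Node indices sorted by their entry in a (singular or eigen)vector."""
    return sorted(range(len(vec)), key=lambda i: vec[i])


def sweep_cut_counts(ordering, cycles):
    """For each prefix size s = 1, ..., n-1 of the ordering, count the
    3-cycles that have nodes on both sides of the cut.

    cycles is an iterable of node triples; counts[s - 1] corresponds to
    the set made of the first s nodes in the ordering.
    """
    pos = {v: p for p, v in enumerate(ordering)}
    n = len(ordering)
    diff = [0] * (n + 1)
    for cyc in cycles:
        ps = [pos[v] for v in cyc]
        lo, hi = min(ps), max(ps)
        # The cycle is split exactly by prefix sizes lo + 1, ..., hi.
        diff[lo + 1] += 1
        diff[hi + 1] -= 1
    counts, running = [], 0
    for s in range(1, n):
        running += diff[s]
        counts.append(running)
    return counts


if __name__ == "__main__":
    order = ordering_from_vector([-0.5, -0.4, -0.3, 0.3, 0.4, 0.5])
    print(sweep_cut_counts(order, [(0, 1, 2), (3, 4, 5)]))
```

In the toy example the prefix of size three cuts no cycle, so a sweep
that minimizes the number of cut cycles would select it.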

6.1 Data preprocessing Before running partitioning
algorithms, we first filter the networks as follows: (1) remove all
edges that do not participate in any D3C, and (2) take the
largest strongly connected component of the remaining
network. We perform this filtering to make a fairer comparison
between the different partitioning algorithms. Table 1 lists
the relevant networks and statistics for the filtered networks
that we use in our experiments. We limit ourselves to a few
representative networks to illustrate the main patterns we
observed. Data for more networks is available in the full
version of this paper. Networks are available from SNAP [23].
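
The two filtering steps can be sketched as follows. This is a minimal
standard-library illustration, not the code used for the experiments:
the function names are hypothetical, a D3C is assumed to be a cycle
u -> v -> w -> u on three distinct nodes, and the strongly connected
components are found with Kosaraju's algorithm, which is our choice for
the sketch.

```python
def edges_in_directed_3cycles(edges):
    """Keep only edges u->v that lie on some cycle u->v->w->u."""
    E = set(edges)
    succ = {}
    for u, v in E:
        succ.setdefault(u, set()).add(v)
    keep = set()
    for u, v in E:
        if u == v:
            continue
        if any(w != u and w != v and (w, u) in E for w in succ.get(v, ())):
            keep.add((u, v))
    return keep


def largest_scc(edges):
    """Node set of the largest strongly connected component (Kosaraju)."""
    succ, pred, nodes = {}, {}, set()
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        pred.setdefault(v, []).append(u)
        nodes.update((u, v))
    order, seen = [], set()
    for s in nodes:  # pass 1: record DFS finishing order
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(succ.get(s, ())))]
        while stack:
            node, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(succ.get(w, ()))))
                    break
            else:
                order.append(node)
                stack.pop()
    best, assigned = set(), set()
    for s in reversed(order):  # pass 2: DFS on the reversed graph
        if s in assigned:
            continue
        comp, stack = {s}, [s]
        assigned.add(s)
        while stack:
            node = stack.pop()
            for w in pred.get(node, ()):
                if w not in assigned:
                    assigned.add(w)
                    comp.add(w)
                    stack.append(w)
        if len(comp) > len(best):
            best = comp
    return best


def filter_network(edges):
    """Apply step (1), then step (2); return (nodes, edges)."""
    kept = edges_in_directed_3cycles(edges)
    comp = largest_scc(kept)
    return comp, {(u, v) for u, v in kept if u in comp and v in comp}


if __name__ == "__main__":
    toy = [(0, 1), (1, 2), (2, 0), (2, 3),
           (3, 4), (4, 5), (5, 3), (4, 6), (6, 3)]
    print(filter_network(toy))
```

On the toy graph, the bridge edge (2, 3) lies on no D3C and is dropped,
after which the four-node component {3, 4, 5, 6} is retained.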

6.2 Results Figure 4 shows the sweep profiles on the
networks in Table 1. The results are for a single cut of the
network. The plots show the higher-order conductance
(Eqn. (3.9)), higher-order expansion (Eqn. (3.10)), and
density of the smaller of the partitioned vertex sets. For
email-EuAll and wiki-Talk, higher-order conductance
is the same for most algorithms, but TSC has much better
higher-order expansion when the vertex set gets large
enough. On soc-Epinions1 and twitter_combined,
standard spectral methods have better higher-order
conductance and higher-order expansion. Crucially, in all cases,
TSC finds much denser subgraphs. In general, we expect
communities with lots of directed 3-cycles to be dense sets.
Thus, even though TSC does not always do well
with respect to the score metrics discussed in Sec. 3.4, it is
still finding relevant structure.

Figure 4: (Top row) Higher-order conductance, $\phi_3(S)$, as a function of the size of the smaller vertex partition set ($|S|$). The size of
the vertex set runs from twenty to half the nodes in the network. (Middle row) Higher-order expansion, $\rho_3(S)$. (Bottom
row) Density of the cluster. For email-EuAll and wiki-Talk (left two columns), higher-order conductance from TSC is
on par with other spectral methods, and higher-order expansion is better for large enough clusters. For soc-Epinions1
and twitter_combined, the higher-order conductance and expansion are better using standard clustering algorithms. In all
cases, TSC finds much denser clusters.


Table 1: Statistics of networks used for computing directed
3-cycle cuts. The statistics are taken on the largest strongly
connected component of the network after removing all
edges that do not participate in any D3C.

Network             n = |V|    m = |E|      #D3Cs
email-EuAll         11,315     80,211       183,836
soc-Epinions1       15,963     262,779      738,231
wiki-Talk           52,411     957,753      5,138,613
twitter_combined    57,959     1,371,621    6,921,399

Since our goal is to explore structural properties of
the cuts, we did not tune our TSC algorithms for high
performance. Consequently, we do not compare running
times of the algorithms. However, we note that for each
network, our straightforward implementation of TSC ran in
under 10 minutes using a laptop with 4GB of memory.


7 Related work 

While the bulk of community detection algorithms are for
undirected networks, there is still an abundance of
methods for directed networks [25]. There are several spectral
algorithms related to partitioning directed networks. The
ones we investigated in this paper were based on the
undirected Laplacian (i.e., standard spectral clustering but
ignoring edge directions), the directed Laplacian [5], the
asymmetric Laplacian [4], and co-clustering [9, 28]. Other
spectral algorithms are based on dyadic methods [24] and
optimizing directed modularity [22].

There is some work in community detection that
explicitly targets higher-order structures. Klymko et al. weight
directed edges in triangles and then revert to a clustering
algorithm for undirected networks [19]. Clique
percolation builds overlapping communities by examining small
cliques [8]. Optimizing the LinkRank metric can identify
communities based on information flow [18], which is
similar to our use of directed 3-cycles in Sec. 5.1. Multi-way
relationships between nodes are also explicitly handled by
hypergraph partitioners [17].

Finally, we mention that tensor factorizations have been 
used by Huang et al. to find overlapping communities [16]. 
This work uses new spectral techniques for learning latent 
variable models from higher-order moments [2]. 


8 Discussion 

We have provided a framework for tensor spectral clustering
that opens the door to further higher-order network
analysis. The framework gives the user the flexibility to cluster
structures based on his or her application. In Sec. 5 we
provided two applications (layered flow networks and anomaly
detection) that showed how this framework can lead to
better clustering of nodes based on network motifs. For these
applications, the networks were small and manually
constructed. In future work, we plan to explore these
applications on large networks.

In Sec. 6, we explored clustering based on directed
3-cycles. In some cases, TSC provided much better cuts in
terms of higher-order expansion. Interestingly, for some
networks, simply removing edges that do not participate in
a directed 3-cycle and using a standard spectral clustering
algorithm is sufficient for finding good cuts with respect
to higher-order conductance and higher-order expansion.
However, in these cases, we are comparing against baselines
optimized for our specific problem. That being said, TSC
does always identify much denser clusters. The networks
we analyzed were social and internet-based, and it would be
interesting to see if similar trends hold for networks derived
from physical or biological systems.

For the large networks, we did not perform full directed 
clustering—we only investigated the sweep profiles. The 
higher-level goal of this paper is to explore the ideas in 
higher-order clustering, and we leave full-stack algorithms 
to future work. One interesting question for such algorithms 
is whether we should partition based on recursive bisection 
(Algorithm 2) or k-means. These algorithmic variations 
provide several opportunities for challenging future work. 

9 Proof of Lemma 4.1

Proof. Without loss of generality, order the nodes so that
the nodes in $V_1$ are first, the nodes in $V_2$ are second, and the
sink node $t$ is last. Let $R_k^{(i)}$ be the transition probabilities of
$R_k$, restricted to nodes in $V_i$, $i = 1, 2$. Because nodes from
two different strongly connected components cannot be in
the same directed 3-cycle,


Rk = R^^ 

0 

0 



0 

0 

0 

0 

0 

0 

,k e Vi; 

Rk^ 

0 

Rf 

0 

0 

cT 

1 



cT 

0 

1 


Here, the first diagonal block is of size $|V_1| \times |V_1|$, the
second of size $|V_2| \times |V_2|$, and the third of size $1 \times 1$. Consider
the following vectors that have the same block structure:

= K ® 5^2 = [0 l]- 

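Then $y_i^T R_k = y_i^T$, $i = 1, 2$, for $k \in V_1$ and $k \in V_2$. Thus,

\[
y_i^T P[x] = \sum_k y_i^T (x_k R_k) = \sum_k x_k y_i^T = y_i^T.
\]

The vector $z = y_1 - \alpha y_2$, with the scalar $\alpha$ chosen so that
$z^T e = 0$, satisfies $z^T P[x] = z^T$. Since $e^T P[x] = e^T$, $z$ is a
second left eigenvector. Finally, $z_i$ is positive for all $i \in V_1$
and negative for all $i \in V_2$.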








10 Directed 3-cycle cuts on more networks 

Here we present the results of Sec. 6 on more networks.
Table 2 lists the statistics of eleven networks that we consider.
We include one undirected network, email-Enron. For this
data set, all undirected edges are simply replaced with two
directed edges. Figures 5, 6, and 7 show the sweep profiles
for higher-order conductance, higher-order expansion, and
density, respectively.
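
As an illustration of the conversion used for email-Enron, the sketch
below replaces each undirected edge with two directed edges and then
counts directed 3-cycles. The function names are hypothetical, node
labels are assumed to be comparable (e.g., integers), and the counting
convention (each cyclic orientation counted once) is our assumption and
is not necessarily the one behind the #D3Cs column.

```python
def to_directed(undirected_edges):
    """Replace every undirected edge {u, v} with u->v and v->u."""
    directed = set()
    for u, v in undirected_edges:
        directed.add((u, v))
        directed.add((v, u))
    return directed


def count_directed_3cycles(edges):
    """Count cycles i->j->k->i on three distinct nodes, each orientation
    once, by requiring i to be the smallest label in the cycle."""
    E = set(edges)
    succ = {}
    for u, v in E:
        succ.setdefault(u, set()).add(v)
    count = 0
    for i, j in E:
        if not i < j:
            continue
        for k in succ.get(j, ()):
            if i < k and k != j and (k, i) in E:
                count += 1
    return count


if __name__ == "__main__":
    triangle = to_directed([(0, 1), (1, 2), (0, 2)])
    print(len(triangle), count_directed_3cycles(triangle))
```

Under this convention, one undirected triangle yields two directed
3-cycles, one for each orientation.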


Table 2: Statistics of networks used for computing directed
3-cycle cuts. The statistics are taken on the largest strongly
connected component of the network after removing all
edges that do not participate in any directed 3-cycle.

Network             n = |V|    m = |E|      #D3Cs
wiki-Vote           1,151      24,349       43,975
wiki-RfA            2,219      61,965       133,004
as-caida20071105    8,320      50,016       72,664
email-EuAll         11,315     80,211       183,836
web-Stanford        12,520     105,376      212,639
soc-Epinions1       15,963     262,779      738,231
soc-Slashdot0811    22,193     377,172      883,884
email-Enron         22,489     332,396      1,447,534
wiki-Talk           52,411     957,753      5,138,613
twitter_combined    57,959     1,371,621    6,921,399
amazon0312          253,405    1,476,377    1,682,909

Figure 5: Directed 3-cycle higher-order conductance ($\phi_3(S)$, Eqn. (3.9)) as a function of the smaller partition size, $|S|$. The
size runs from twenty nodes to half the nodes in the network.

Figure 6: Directed 3-cycle higher-order expansion ($\rho_3(S)$, Eqn. (3.10)) as a function of the smaller partition size, $|S|$. The
size runs from twenty nodes to half the nodes in the network.

Figure 7: Density of the smaller partition set of vertices as a function of its size, $|S|$. The size runs from twenty nodes to
half the nodes in the network. In nearly all cases, TSC finds the densest clusters.
